The data were downloaded from Albemarle.org.¹ All analyses were performed in the R statistical computing environment (R Core Team 2024), version 4.3.3, on 2024-04-03.
Below is a summary of the data, courtesy of the {modelsummary} package (Arel-Bundock 2022).
modelsummary::datasummary_skim(homes, type = "numeric")
| | Unique | Missing Pct. | Mean | SD | Min | Median | Max | Histogram |
|---|---|---|---|---|---|---|---|---|
| lotsize | 7755 | 0 | 4.6 | 21.3 | 0.0 | 0.6 | 1067.2 | |
| totalvalue | 10750 | 0 | 582692.9 | 495109.1 | 7600.0 | 464900.0 | 16272400.0 | |
| lastsaleprice | 6416 | 1 | 297446.8 | 656670.5 | 0.0 | 220000.0 | 58000000.0 | |
| yearbuilt | 229 | 0 | 1984.1 | 37.7 | 1668.0 | 1991.0 | 2024.0 | |
| yearremodeled | 78 | 91 | 2004.8 | 14.9 | 1901.0 | 2007.0 | 2023.0 | |
| finsqft | 4184 | 0 | 2114.1 | 983.3 | 144.0 | 1924.0 | 19776.0 | |
| bedroom | 13 | 0 | 3.4 | 0.9 | 0.0 | 3.0 | 18.0 | |
| fullbath | 14 | 0 | 2.4 | 1.0 | 0.0 | 2.0 | 14.0 | |
| age | 229 | 0 | 39.9 | 37.7 | 0.0 | 33.0 | 356.0 | |
| remodeled | 2 | 0 | 0.1 | 0.3 | 0.0 | 0.0 | 1.0 | |
Median total value and lot size by high school district.
stats <- aggregate(cbind(totalvalue, lotsize) ~ hsdistrict,
                   data = homes, FUN = median)
knitr::kable(stats, col.names = c("High School District",
                                  "Median Home Value",
                                  "Median Lot Size"),
             align = "lcc")
| High School District | Median Home Value | Median Lot Size |
|---|---|---|
| Albemarle | 429800 | 0.2788 |
| Monticello | 446300 | 1.0330 |
| Western Albemarle | 574800 | 1.3990 |
The current real estate tax rate in Albemarle County is $0.854 per $100 of assessed value.² For example, the owner of a home assessed at $400,000 owes the following:
\[ \frac{\$400{,}000}{\$100} \times \$0.854 = \$3{,}416 \]
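The same arithmetic can be wrapped in a small function. This is a minimal sketch: `real_estate_tax()` and its argument names are hypothetical, and the default rate is simply the $0.854 figure quoted above.

```r
# Hypothetical helper: tax owed at a given rate per $100 of assessed value.
# The default rate is the $0.854 figure quoted in the text.
real_estate_tax <- function(assessed_value, rate_per_100 = 0.854) {
  assessed_value / 100 * rate_per_100
}

# The worked example from the text.
real_estate_tax(400000)

# The function is vectorized, so it also accepts several values at once,
# here the three district medians from the table above.
real_estate_tax(c(429800, 446300, 574800))
```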
Plot of total value versus finished square feet using ggplot2 (Wickham 2016).
library(ggplot2)
ggplot(homes) +
  aes(x = finsqft, y = totalvalue) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10() +
  facet_wrap(~hsdistrict) +
  labs(caption = "Note: axes on log10 scale.")
Over 18% of homes in the Western Albemarle school district are assessed at over $1,000,000.
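A share like this is the mean of a logical vector within each district. The sketch below shows one way to compute it, using a small made-up data frame `toy` in place of `homes`. Only the column names match the real data; the values are illustrative and not taken from the county records.

```r
# Made-up stand-in for `homes`; only the column names match the real data.
toy <- data.frame(
  hsdistrict = c("Albemarle", "Albemarle", "Monticello",
                 "Western Albemarle", "Western Albemarle"),
  totalvalue = c(350000, 1200000, 450000, 1500000, 600000)
)

# TRUE/FALSE values average to a proportion; tapply() does this per district.
share_over_1m <- tapply(toy$totalvalue > 1e6, toy$hsdistrict, mean)

# Express the proportions as percentages.
round(100 * share_over_1m, 1)
```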
We fit two models with log finished square feet and high school district as predictors. The second model adds an interaction between the two.
| | model 1 | model 2 |
|---|---|---|
| (Intercept) | 4.571 | 5.647 |
| | (0.034) | (0.061) |
| log(finsqft) | 1.122 | 0.979 |
| | (0.004) | (0.008) |
| Monticello | -0.008 | -1.467 |
| | (0.004) | (0.082) |
| Western Albemarle | 0.103 | -1.524 |
| | (0.005) | (0.085) |
| log(finsqft):Monticello | | 0.194 |
| | | (0.011) |
| log(finsqft):Western Albemarle | | 0.215 |
| | | (0.011) |
| Num.Obs. | 32502 | 32502 |
| R2 | 0.675 | 0.679 |
| R2 Adj. | 0.675 | 0.679 |
| AIC | 871946.3 | 871508.5 |
| BIC | 871988.3 | 871567.3 |
| Log.Lik. | -10699.623 | -10478.719 |
| F | 22464.295 | 13751.153 |
| RMSE | 0.34 | 0.33 |
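The fitting code is not shown, so the following is only a sketch of how two models of this shape might be specified. It assumes ordinary least squares via `lm()` and a log-transformed `totalvalue` as the response; both are inferred from the table rather than stated. The data frame `sim` is simulated with arbitrary coefficients and stands in for `homes`.

```r
set.seed(1)

# Simulated stand-in for `homes`; the coefficients used here are arbitrary.
n <- 300
sim <- data.frame(
  finsqft = round(runif(n, 800, 5000)),
  hsdistrict = factor(sample(c("Albemarle", "Monticello", "Western Albemarle"),
                             n, replace = TRUE))
)
sim$totalvalue <- exp(5 + 1.1 * log(sim$finsqft) + rnorm(n, sd = 0.3))

# Model 1: one slope for log(finsqft), plus a shift for each district.
m1 <- lm(log(totalvalue) ~ log(finsqft) + hsdistrict, data = sim)

# Model 2: the interaction lets the slope differ by district.
m2 <- lm(log(totalvalue) ~ log(finsqft) * hsdistrict, data = sim)

# Coefficients of the interaction model, then an F test of the added terms.
coef(m2)
anova(m1, m2)
```

In a specification like this, the district-specific slope is the `log(finsqft)` coefficient plus the matching interaction term, which is why the district main effects change so much between the two columns of the table.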
Visualize the model using the ggeffects package (Lüdecke 2018).
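One way such a plot might be produced is with `ggpredict()` from ggeffects. This is a sketch under assumptions: the model form (`lm()` with a log-transformed response) is inferred from the coefficient table, and `sim` is simulated data with arbitrary coefficients standing in for `homes`. It requires the ggeffects package, and ggplot2 for the `plot()` method.

```r
library(ggeffects)  # plot() on the result also needs {ggplot2}

set.seed(1)

# Simulated stand-in for `homes`; the coefficients used here are arbitrary.
n <- 300
sim <- data.frame(
  finsqft = round(runif(n, 800, 5000)),
  hsdistrict = factor(sample(c("Albemarle", "Monticello", "Western Albemarle"),
                             n, replace = TRUE))
)
sim$totalvalue <- exp(5 + 1.1 * log(sim$finsqft) + rnorm(n, sd = 0.3))

# Interaction model, assumed to mirror "model 2" in the table.
m2 <- lm(log(totalvalue) ~ log(finsqft) * hsdistrict, data = sim)

# Predictions across finsqft, grouped by district.
pred <- ggpredict(m2, terms = c("finsqft", "hsdistrict"))

# One fitted curve per district, with confidence bands.
plot(pred)
```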
hist(homes$age[homes$hsdistrict == "Albemarle"], main = "", xlab = "Age")
hist(homes$age[homes$hsdistrict == "Monticello"], main = "", xlab = "Age")
hist(homes$age[homes$hsdistrict == "Western Albemarle"], main = "", xlab = "Age")